Second OLS Assumption
- (\(X_i\), \(Y_i\)), \(i = 1, ..., n\) are independently and identically distributed (i.i.d.)
(\(X_i\), \(Y_i\)), \(i = 1, ..., n\) are independently distributed
- Knowing that observation i takes on particular values for X and Y tells you nothing about the probability of the next observation taking on particular values for X and Y
- When is the independence assumption violated?
- Longitudinal data: knowing a data point in one time period tells you something about a data point in another time period
- Ex: The amount of money a student received via a Federal Pell Grant freshman year at UA probably tells you something about how much the student received via a Federal Pell Grant sophomore year at UA
- Hierarchical data: knowing a data point in one cluster (e.g., a classroom or school) tells you something about other data points in the same cluster
- Ex: Students are nested within classrooms. Knowing the reading test score of one student in a classroom is probably correlated with reading test score for another student in the same classroom.
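The classroom example above can be sketched with a small simulation. The numbers here (a mean score of 500, a classroom effect, and student-level noise) are made up for illustration, not taken from the notes; the point is that when each classroom has its own mean, two students from the same classroom have correlated scores, so the observations are not independent.

```python
import random

random.seed(0)

# Hypothetical hierarchical data: students nested in classrooms.
# Each classroom gets its own mean reading score, so two students
# drawn from the same classroom share that classroom effect.
n_classes = 200
x, y = [], []  # reading scores of two classmates per classroom
for _ in range(n_classes):
    class_mean = random.gauss(500, 40)      # classroom effect (assumed SDs)
    x.append(class_mean + random.gauss(0, 20))  # student 1's score
    y.append(class_mean + random.gauss(0, 20))  # student 2's score

# Sample correlation between classmates' scores
mx, my = sum(x) / len(x), sum(y) / len(y)
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
sx = (sum((a - mx) ** 2 for a in x) / len(x)) ** 0.5
sy = (sum((b - my) ** 2 for b in y) / len(y)) ** 0.5
r = cov / (sx * sy)
print(round(r, 2))  # well above 0: knowing one classmate's score is informative
```

If the students were truly independent, this correlation would hover near zero; the shared classroom effect pushes it far above that.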
(\(X_i\), \(Y_i\)), \(i = 1, ..., n\) are identically distributed
- Prior to choosing the sample of observations from the population, the probability distribution (i.e., the likelihood that \(Y_i\) takes on certain values) is the same for all observations
- This is always true if you take a random sample!
- One randomly selected observation has the same probability of taking on a certain value of \(Y_i\) as another randomly selected observation
- When is the identically distributed assumption violated?
- Sampling bias:
- Ex: You want to investigate the probability of being tardy to college lectures and take a sample of students living on campus. Sampling bias: you did not include commuter students. A randomly selected student living on campus is not likely to have the same probability of being tardy to lecture as a randomly selected commuter student.
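The tardiness example can also be simulated. The tardiness rates and population split below are invented for illustration (10% for on-campus students, 30% for commuters, half of the population in each group): sampling only on-campus students understates the population tardiness rate, while a random sample from the whole population recovers it.

```python
import random

random.seed(1)

# Hypothetical population: half on-campus, half commuter students,
# with assumed (made-up) tardiness probabilities.
p_campus, p_commuter = 0.10, 0.30
population = ([("campus", p_campus)] * 5000 +
              [("commuter", p_commuter)] * 5000)

def tardy(p):
    # Bernoulli draw: True with probability p
    return random.random() < p

# Biased sample: only students living on campus
biased = [tardy(p) for kind, p in population if kind == "campus"]
biased_rate = sum(biased) / len(biased)

# Random sample from the whole population
rand = [tardy(p) for kind, p in random.sample(population, 5000)]
random_rate = sum(rand) / len(rand)

print(round(biased_rate, 2))  # close to 0.10: misses commuters entirely
print(round(random_rate, 2))  # close to the population rate of 0.20
```

Because the on-campus-only sample draws from a different distribution than the population, the observations are not identically distributed with it, and the resulting estimate is biased.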